xhtmlchardet
Basic character set detection for XML and HTML in Rust.
Minimum Supported Rust Version: 1.24.0
Example
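extern crate xhtmlchardet;

use std::io::Cursor;

fn main() {
    let text = b"<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><channel><title>Example</title></channel>";
    // detect reads from any std::io::Read; the second argument is an optional charset hint.
    let mut text_cursor = Cursor::new(text.to_vec());
    let detected_charsets: Vec<String> = xhtmlchardet::detect(&mut text_cursor, None).unwrap();
    assert_eq!(detected_charsets, vec!["iso-8859-1".to_string()]);
}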
Rationale
I wrote a feed crawler that needed to determine the character set of fetched content so that it could be normalised to UTF-8. Initially I used the uchardet crate, but I encountered some situations where it misdetected the charset. I collected these edge cases into a test suite, then implemented this crate, which passes all of those tests. It uses a fairly naïve approach derived from Appendix F of the XML specification (Autodetection of Character Encodings).
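To give a feel for the kind of detection Appendix F describes, here is a minimal sketch. It is not this crate's implementation: sniff_leading_bytes and declared_encoding are hypothetical helpers written for illustration, and the sketch assumes a reasonably recent Rust toolchain rather than the MSRV above. It first checks the leading bytes for a byte order mark or a recognisable encoding of "<?", then, for ASCII-compatible input, reads the encoding pseudo-attribute from the XML declaration.

```rust
/// Guess an encoding family from the first bytes of a document, using the
/// byte patterns listed in Appendix F of the XML specification.
/// Hypothetical helper for illustration; not part of xhtmlchardet.
fn sniff_leading_bytes(bytes: &[u8]) -> Option<&'static str> {
    // Order matters: the UTF-32LE byte order mark (FF FE 00 00) begins with
    // the UTF-16LE one (FF FE), so the 4-byte marks are tested first.
    const PATTERNS: &[(&[u8], &str)] = &[
        (&[0x00, 0x00, 0xFE, 0xFF], "utf-32be"),
        (&[0xFF, 0xFE, 0x00, 0x00], "utf-32le"),
        (&[0xEF, 0xBB, 0xBF], "utf-8"),
        (&[0xFE, 0xFF], "utf-16be"),
        (&[0xFF, 0xFE], "utf-16le"),
        // No byte order mark: "<" or "<?" as encoded in each family.
        (&[0x00, 0x00, 0x00, 0x3C], "utf-32be"),
        (&[0x3C, 0x00, 0x00, 0x00], "utf-32le"),
        (&[0x00, 0x3C, 0x00, 0x3F], "utf-16be"),
        (&[0x3C, 0x00, 0x3F, 0x00], "utf-16le"),
    ];
    for &(pattern, name) in PATTERNS {
        if bytes.starts_with(pattern) {
            return Some(name);
        }
    }
    None // No hint from the leading bytes; fall back to the declaration.
}

/// Read the encoding named in an XML declaration, for ASCII-compatible input.
/// Hypothetical helper for illustration; not part of xhtmlchardet.
fn declared_encoding(bytes: &[u8]) -> Option<String> {
    // The declaration must come first, so a short prefix is enough.
    let head = String::from_utf8_lossy(&bytes[..bytes.len().min(512)]);
    let head: &str = &head;
    if !head.starts_with("<?xml") {
        return None;
    }
    let decl = &head[..head.find("?>")?];
    let after = &decl[decl.find("encoding")? + "encoding".len()..];
    let after = after.trim_start().strip_prefix('=')?.trim_start();
    // The value may be quoted with either single or double quotes.
    let quote = after.chars().next().filter(|c| *c == '"' || *c == '\'')?;
    let value = &after[1..];
    let end = value.find(quote)?;
    Some(value[..end].to_ascii_lowercase())
}
```

A caller would typically try sniff_leading_bytes first and consult declared_encoding only when it returns None. A real detector also has to cope with HTML meta tags, missing declarations, and declarations that disagree with the bytes, which this sketch ignores.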